{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "# Ignore numpy dtype warnings. These warnings are caused by an interaction\n",
    "# between numpy and Cython and can be safely ignored.\n",
    "# Reference: https://stackoverflow.com/a/40846742\n",
    "warnings.filterwarnings(\"ignore\", message=\"numpy.dtype size changed\")\n",
    "warnings.filterwarnings(\"ignore\", message=\"numpy.ufunc size changed\")\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "import ipywidgets as widgets\n",
    "from ipywidgets import interact, interactive, fixed, interact_manual\n",
    "\n",
    "sns.set()\n",
    "sns.set_context('talk')\n",
    "np.set_printoptions(threshold=20, precision=2, suppress=True)\n",
    "pd.options.display.max_rows = 7\n",
    "pd.options.display.max_columns = 8\n",
    "pd.set_option('precision', 2)\n",
    "# This option stops scientific notation for pandas\n",
    "# pd.set_option('display.float_format', '{:.2f}'.format)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercises\n",
    "\n",
    "- Another loss function called the Huber loss combines the absolute and\n",
    "  squared loss to create a loss function that is both smooth and robust\n",
    "  to outliers. The Huber loss accomplishes this by behaving like the squared loss\n",
    "  for $\\theta$ values close to the minimum and switching to absolute loss for\n",
    "  $\\theta$ values far from the minimum. Below is a formula for a simplified\n",
    "  version of Huber loss. Use this definition of Huber loss to\n",
    "   - Write a function called `mhe` to compute the mean Huber error.\n",
    "   - Plot the smooth `mhe` curve for the bus times data where $\\theta$ ranges from -2\n",
    "     to 8.\n",
    "   - Use trial and error to find the minimizing $\\hat \\theta$ for bus times.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "l(\\theta, y)\n",
    "&= \\frac{1}{2} (y - \\theta)^2  &\\textrm{for}~ |y-\\theta| \\leq 2\\\\\n",
    "&= 2(|y - \\theta| - 1)  &\\textrm{otherwise.}\\\\\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "- Continue with Huber loss and the function `mhe` in the previous problem:\n",
    "   - Plot the smooth `mhe` for the five data points $[-2, 0, 1, 5, 10]$.\n",
    "   - Describe the curve. \n",
    "   - For these five points, what is the minimizing $\\hat \\theta$? \n",
    "   - What happens when the data point 10 is swapped for 100? Compare the minimizer to the\n",
    "     mean and median of the five points."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Consider a loss function that has 0 loss for negative values of $y$ and quadratic loss for positive $y$. \n",
    "    - Write a function, called `m0e` that computes the average loss for this function.\n",
    "    - Plot the `m0e` curve for many $\\theta$s given the data  $\\mathbf{y} = [-2, 0, 1, 5, 10]$\n",
    "    - Use trial and error to find the minimizing $\\hat \\theta$.\n",
    "    - Intuitively, what should the minimizing value be? What if we use linear loss instead? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- In this exercise, we again show that the mean minimizes the mean square error, but we will use calculus instead.\n",
    "   -  Take the derivative of the average loss with respect to $\\theta$.\n",
    "   - Set the derivative to 0 and solve for $\\hat{\\theta}$.\n",
    "   - To be thorough, take a second derivative to confirm that $\\bar{y}$ is a minimizer. (Recall that if the second derivative is positive than the quadratic is concave.)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Follow the steps below to establish that MAE is minimized for the median. \n",
    "   - Split the summation, $\\frac{1}{n} \\sum_{i = 1}^{n}|y_i - \\theta|$ into\n",
    "     three terms for when $y_i - \\theta$ is negative, 0, and positive. \n",
    "   - Set the middle term to 0 so that the equations are easier to work with.\n",
    "     Use the fact that the derivative of the absolute value is -1 or +1 to\n",
    "     differentiate the remaining two terms with respect to $\\theta$. \n",
    "   - Set the derivative to 0 and simplify terms. Explain why when there are an\n",
    "     odd number of points, the solution is the median.\n",
    "   - Explain why when there are an even number of points, the minimizing\n",
    "     $\\theta$ is not uniquely defined (just as with the median). "
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
